Discriminative Feature Grouping
Authors
Abstract
Feature grouping has been demonstrated to be promising in learning with high-dimensional data. It helps reduce the variance in estimation and improves the stability of feature selection. One major limitation of existing feature grouping approaches is that similar but different feature groups are often mis-fused, leading to impaired performance. In this paper, we propose a Discriminative Feature Grouping (DFG) method to discover feature groups with enhanced discrimination. Different from existing methods, DFG adopts a novel regularizer for the feature coefficients to trade off between fusing and discriminating feature groups. The proposed regularizer consists of an ℓ1 norm to enforce feature sparsity and a pairwise ℓ∞ norm to encourage the absolute differences among any three feature coefficients to be similar. To achieve a better asymptotic property, we generalize the proposed regularizer to an adaptive one in which the feature coefficients are weighted based on the solution of some estimator with root-n consistency. For optimization, we employ the alternating direction method of multipliers to solve the proposed methods efficiently. Experimental results on synthetic and real-world datasets demonstrate that the proposed methods perform well compared with state-of-the-art feature grouping methods.

Introduction

Learning with high-dimensional data is challenging, especially when the size of the data is not very large. Sparse modeling, which selects only a relevant subset of the features, has thus received increasing attention. Lasso (Tibshirani 1996) is one of the most popular sparse modeling methods and has been well studied in the literature. However, in the presence of highly correlated features, Lasso tends to select only one or some of those features, leading to unstable estimation and impaired performance. To address this issue, the group lasso (Yuan and Lin 2006) has been proposed to select groups of features by using the ℓ1/ℓ2 regularizer.
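The ℓ1/ℓ2 regularizer of the group lasso mentioned above is simple to state in code: it is the sum of the ℓ2 norms of the coefficients within each given feature group. The sketch below is only illustrative; the coefficient vector and group index sets are made up, and the group structure is assumed to be given a priori, as the text notes.

```python
import numpy as np

def group_lasso_penalty(beta, groups):
    """l1/l2 regularizer: sum of the l2 norms of each feature group.

    `groups` is a list of index lists, one per (given) feature group.
    This is a sketch of the penalty term only, not a full estimator.
    """
    return sum(np.linalg.norm(beta[g]) for g in groups)

beta = np.array([1.0, -2.0, 0.0, 3.0])
groups = [[0, 1], [2, 3]]
print(group_lasso_penalty(beta, groups))  # sqrt(5) + 3 ~= 5.236
```

Because the ℓ2 norm of a group is non-differentiable only when the whole group is zero, this penalty tends to zero out entire groups at once, which is the group-sparsity effect described in the text.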
As extensions of the group lasso, several methods have been proposed to learn from overlapping groups (Zhao, Rocha, and Yu 2009; Jacob, Obozinski, and Vert 2009; Yuan, Liu, and Ye 2011). Other extensions of the group lasso, e.g., (Kim and Xing 2010; Jenatton et al. 2010), aim to learn from given tree-structured information among features. However, these methods require the feature groups to be given as a priori information. That is, they can utilize the given feature groups to obtain solutions with group sparsity, but they lack the ability to learn the feature groups. Feature grouping techniques, which find groups of highly correlated features automatically from data, have thus been proposed to address this issue. These techniques help gain additional insights for understanding and interpreting data, e.g., finding co-regulated genes in microarray analysis (Dettling and Bühlmann 2004). Feature grouping techniques assume that features with identical coefficients form a feature group. The elastic net (Zou and Hastie 2005) is a representative feature grouping approach, which combines the ℓ1 and ℓ2 norms to encourage highly correlated features to have identical coefficients. The fused Lasso family, including the fused Lasso (Tibshirani et al. 2005), the graph-based fused Lasso (Kim and Xing 2009), and the generalized fused Lasso (GFLasso) (Friedman et al. 2007), uses fused regularizers to directly force the coefficients of each pair of features to be close based on the ℓ1 norm. Recently, the OSCAR method (Bondell and Reich 2008), which combines an ℓ1 norm and a pairwise ℓ∞ norm on each pair of features, has shown good performance in learning feature groups. Moreover, several extensions of OSCAR have been proposed (Shen and Huang 2010; Yang et al. 2012; Jang et al. 2013) to further reduce the estimation bias.

∗Both authors contribute equally. Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
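To make the pairwise ℓ∞ idea behind OSCAR concrete, here is a minimal sketch of the OSCAR penalty as described above: an ℓ1 term plus, for every pair of features, the maximum of the two absolute coefficients. The coefficient vector and weights below are made up for illustration; this sketches the penalty only, not the full OSCAR estimator or the DFG regularizer proposed in this paper.

```python
import numpy as np

def oscar_penalty(beta, lam1, lam2):
    """OSCAR regularizer (Bondell and Reich 2008):
    lam1 * ||beta||_1 + lam2 * sum_{i<j} max(|beta_i|, |beta_j|).

    The pairwise max terms pull coefficient magnitudes toward each
    other, which produces the feature-grouping effect.
    """
    a = np.abs(beta)
    l1_term = a.sum()
    pair_term = sum(max(a[i], a[j])
                    for i in range(len(a))
                    for j in range(i + 1, len(a)))
    return lam1 * l1_term + lam2 * pair_term

beta = np.array([2.0, -2.0, 0.5])
# |beta| = [2, 2, 0.5]; pairwise maxima: max(2,2) + max(2,0.5) + max(2,0.5) = 6
print(oscar_penalty(beta, 1.0, 1.0))  # 4.5 + 6 = 10.5
```

Note that the pairwise term is unchanged if two coefficients swap signs while keeping the same magnitudes, which is why OSCAR groups features by the absolute values of their coefficients.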
However, when there exist similar but still different feature groups, we find empirically that all existing feature grouping methods tend to fuse those groups together as one group, thus leading to impaired learning performance. Figure 1(a) shows an example, where G1 and G2 are similar but different feature groups that are easily mis-fused by existing feature grouping methods. In many real-world applications with high-dimensional data, e.g., microarray analysis, feature groups with similar but different feature coefficients appear frequently. For example, using the method in (Jacob, Obozinski, and Vert 2009), the averaged coefficient of each feature group among the given 637 groups, which correspond to biological gene pathways, in the breast cancer data is shown in Figure 1(b), and we can observe that there are many feature groups with similar but different coefficients.

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence
Similar resources
Two feature transformation methods based on genetic algorithms for reducing the classification error of support vector machines
Discriminative methods are used to increase pattern recognition and classification accuracy. These methods can be used as discriminative transformations applied to features, or as discriminative learning algorithms for the classifiers. Usually, the criteria of discriminative transformations differ from the criteria used to train discriminant classifiers or from their error. In this ...
Unsupervised Feature Selection for Relation Extraction
This paper presents an unsupervised relation extraction algorithm, which induces relations between entity pairs by grouping them into a "natural" number of clusters based on the similarity of their contexts. A stability-based criterion is used to automatically estimate the number of clusters. To remove noisy feature words in the clustering procedure, feature selection is conducted by optimizing a ...
Discriminative Feature Metric Learning in the Affinity Propagation Model for Band Selection in Hyperspectral Images
Traditional supervised band selection (BS) methods mainly consider reducing spectral redundancy to improve hyperspectral imagery (HSI) classification using class labels and pairwise constraints. A key observation is that pixels spatially close to each other in an HSI probably have the same signature, while pixels farther away from each other have a high probability of belonging to ...
Image Annotation by Input-Output Structural Grouping Sparsity
Automatic image annotation (AIA) is very important for image retrieval and image understanding. Two key issues in AIA are explored in detail in this paper: structured visual feature selection and the implementation of hierarchical correlated structures among multiple tags to boost the performance of image annotation. This paper simultaneously introduces an input and output structural group...
Perceptual Grouping Enhances Visual Plasticity
Visual perceptual learning, a manifestation of neural plasticity, refers to improvements in performance on a visual task achieved by training. Attention is known to play an important role in perceptual learning, given that the observer's discriminative ability improves only for those stimulus features that are attended. However, the distribution of attention can be severely constrained by percep...